Hardware implementation of Bayesian network based on two-dimensional memtransistors

Bayesian networks (BNs) find widespread application in many real-world probabilistic problems including diagnostics, forecasting, computer vision, etc. The basic computing primitive for BNs is a stochastic bit (s-bit) generator that can control the probability of obtaining ‘1’ in a binary bit-stream. While silicon-based complementary metal-oxide-semiconductor (CMOS) technology can be used for hardware implementation of BNs, the lack of inherent stochasticity makes it area and energy inefficient. On the other hand, memristors and spintronic devices offer inherent stochasticity but lack computing ability beyond simple vector matrix multiplication due to their two-terminal nature and rely on extensive CMOS peripherals for BN implementation, which limits area and energy efficiency. Here, we circumvent these challenges by introducing a hardware platform based on 2D memtransistors. First, we experimentally demonstrate a low-power and compact s-bit generator circuit that exploits cycle-to-cycle fluctuation in the post-programmed conductance state of 2D memtransistors. Next, the s-bit generators are monolithically integrated with 2D memtransistor-based logic gates to implement BNs. Our findings highlight the potential for 2D memtransistor-based integrated circuits for non-von Neumann computing applications.


Reviewer #1 (Remarks to the Author):
This paper proposes a hardware acceleration of BN using a monolithic memtransistor technology based on two-dimensional semiconductors. This technique is a little novel; however, I do not think it can be accepted by this journal. The specific reasons are as follows: 1. The article seems to be a technical improvement in order to complete a task, and the necessary theoretical support is lacking.
We would like to thank the reviewer for acknowledging our efforts, although we struggle to understand what the reviewer means by stating that our work is "a little novel". The reviewer's assessment appears to be subjective rather than based on objective reasoning. The key accomplishments of our work are 1) the experimental demonstration of a novel, low-power, and compact s-bit generator circuit based on 2D memtransistors and 2) the monolithic integration of 2D memtransistor-based s-bit generators with 2D memtransistor-based logic gates to achieve hardware acceleration of BN, which we feel are significant and worthy of reporting in Nature Communications.
We also wonder what the reviewer means by saying that necessary theoretical support is lacking.
The concepts of Bayesian network (BN) and stochastic computing (SC) have been extensively studied in computational neuroscience and theoretical computer science. The theoretical basis of BN is widely known and well understood. What is lacking in the community is the experimental demonstration of such computing paradigms based on novel materials, devices, and phenomena, which can mitigate the energy and area overhead challenges. This is precisely the objective of our work.
2. The effect of hardware acceleration is good; however, the improvement is not significant compared to prior art. If the study is specific to existing technological advances, the authors should point out.
We would like to thank the reviewer for his appreciation of the hardware acceleration of BN.
However, we respectfully disagree with their statement that the improvements are not significant compared to prior art. For example, probabilistic CMOS-based implementation of BN consumes 2 ~ 150 pJ of energy and requires multiple hardware elements, including amplifiers, inverters, flip-flops, etc., to generate the s-bits, which increases the transistor count to more than 100, making it energy and area inefficient [1]. Similarly, BN acceleration using field programmable gate arrays (FPGA) [2][3][4] also requires a large number of transistors that consume a significant amount of energy to generate random bits owing to the absence of inherent stochasticity in silicon devices. Stochastic switching in memristors offers an excellent mechanism to generate random bits [5]. While memristors can be aggressively scaled [6], experimentally reported true random number generators (TRNGs) based on memristors [7] have a large footprint of ~ 25 µm^2, excluding the CMOS peripherals that included one comparator, one AND gate, and two 4-bit counters (>100 CMOS transistors), and the energy consumption was found to be 15 pJ/bit. Finally, stochastic magnetic tunnel junctions (MTJs) can generate s-bits at a frugal energy expenditure of 2 fJ/bit every 1-100 ms. While the footprint of MTJs can be rather small, generation of s-bits requires integration with CMOS peripherals such as one resistor, one transistor, and one comparator, which increases the design complexity and negates the energy and area benefits. Furthermore, two-terminal memristors and spintronic devices are incapable of performing logic operations, necessitating a hybrid design involving CMOS peripherals for a full demonstration of BN. In contrast, our s-bit generator comprises only six 2D memtransistors and consumes only 2 pJ/bit, and we also eliminate the need for a hybrid design since three-terminal memtransistors can be utilized as logic elements.

3. A large amount of information in the graphs lacks the necessary explanation, and the simple accumulation of results makes the readability of this paper extremely poor.
We apologize for the lack of explanation of the figures. We have added explanations and improved the readability of the manuscript.

4. There are many mistakes in the article that need to be corrected.
The reviewer's concern is noted. We have carefully read the manuscript and made necessary corrections. We would like to thank the reviewer for his thorough reading and acknowledgment of our work.
We are happy to provide further analysis and evidence on s-bit generation and MUX-based architecture for the Bayesian network (BN).

On page 5 the last paragraph, the authors mention that "For example, hardware acceleration of BN in Extended Data Fig. 1a can be achieved by using n s-bit generators to obtain the [parent-node probabilities], another N = 2^n s-bit generators to obtain the CPT, and one N × 1 MUX with n select lines as shown in Extended Data Fig. 1c." The hardware accelerators for BNs proposed in the literature capture the local statistical dependencies between variables so that the exponentially large space of the CPT need not be stored. In the architecture the authors are proposing, the requirement of 2^n s-bit generators to store the CPT would require prohibitively large hardware resources as n becomes large. Addressing this point appears important for the proposed application in the paper.

We completely agree with the reviewer that for most realistic scenarios where Bayesian networks (BN) can be used, it is highly unlikely that a child node will have a large number of parent nodes (n) or vice versa. Since a BN represents a set of variables and their conditional dependencies via a directed acyclic graph, finding an optimal graphical representation is critical in minimizing the hardware resources necessary for its acceleration. Local statistical dependencies between the variables are often used to simplify the graphical representation for BN [8]. For example, graphs in which the edges are oriented according to a causal theory are generally more efficient [9].
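The scaling raised in this comment can be made concrete with a short sketch. It simply counts the blocks named in the quoted passage (n parent s-bit generators, 2^n CPT s-bit generators, and an N × 1 MUX with n select lines) for one child node. The function name `mux_bn_resources` is hypothetical, and the default of six memtransistors per s-bit generator is taken from the response to Reviewer #1; devices inside the MUX are not counted.

```python
def mux_bn_resources(n_parents: int, devices_per_sbit: int = 6) -> dict:
    """Block counts for one child node with n binary parents in a MUX-based layout."""
    if n_parents < 0:
        raise ValueError("n_parents must be non-negative")
    cpt_entries = 2 ** n_parents  # one CPT entry per combination of parent states
    sbit_generators = n_parents + cpt_entries
    return {
        "parent_sbit_generators": n_parents,
        "cpt_sbit_generators": cpt_entries,
        "mux_inputs": cpt_entries,
        "mux_select_lines": n_parents,
        # Devices in the s-bit generators only; the MUX itself is not counted.
        "sbit_memtransistors": devices_per_sbit * sbit_generators,
    }


# Growth of the CPT storage with the number of parents.
for n in (1, 2, 3, 8, 16):
    r = mux_bn_resources(n)
    print(n, r["cpt_sbit_generators"], r["sbit_memtransistors"])
```

The count stays modest for a handful of parents and grows exponentially beyond that, which is why the number of parents per node matters for this architecture.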
Therefore, the number of s-bit generators that will be required to store the CPT will remain small for all practical purposes. We have revised Extended Data Fig. 1 and the associated discussion to avoid confusion for the readers.

The reviewer's suggestion is noted. We have included the following discussion on the phenomenon of charge trapping and de-trapping at and near the dielectric/2D interface and the origin of cycle-to-cycle variation in the programmed conductance states in the revised manuscript, along with relevant references.
The shift in the transfer characteristics of post-programmed and post-erased 2D memtransistors can be explained using the phenomenon of charge trapping and de-trapping at and near the 2D/dielectric interface. Note that trap states can originate from defects/imperfections in the dielectric and/or adsorbed species at the 2D/dielectric interface as reported in various earlier studies [10][11][12]. These states can also be engineered at desired energetic locations by introducing intentional defects in the 2D channel material [13,14]. Carrier occupancy in these trap states follows the Fermi-Dirac distribution. As illustrated using the energy band diagrams in Fig. R1, at equilibrium, i.e., in the absence of any gate bias, the trap states with energy levels above the Fermi energy are empty, whereas the ones below are filled. When the memtransistor is subjected to a negative "Write" voltage pulse, electrons are released (de-trapped) from these trap states, leaving them positively charged. This leads to screening of the back-gate bias, which is reflected as a shift in the threshold voltage (ΔVTH) following Eq. 1.
Here, the quantities in Eq. 1 are the magnitude of the effective positive charge at or near the 2D/dielectric interface, the capacitance, the dielectric constant (10ε0, with ε0 = 8.85 × 10^-12 F/m), and the thickness of the back-gate oxide (50 nm). Similarly, when the memtransistor is subjected to a positive "Erase" voltage pulse, electrons are captured back (trapped) into the trap states, restoring the threshold voltage. Note that the number of electrons getting trapped/de-trapped can be controlled by both the magnitude and the duration of the Write and Erase pulses, which allows analog control of ΔVTH, i.e., of the conductance state of the memtransistor.
The cycle-to-cycle variation in program/erase processes is a direct consequence of the stochastic nature of charge trapping and de-trapping observed in most semiconductor/dielectric interfaces [15][16][17]. In the simple two-state model, a trap state can be electrically neutral or charged and can transition between the two states even under equilibrium conditions, with exponentially distributed transition times. In other words, the state transition dynamics for traps follow a classic Markovian process [18,19]. In ultra-scaled metal oxide semiconductor field effect transistors (MOSFETs), such stochastic state transitions lead to random telegraph noise (RTN). Often metastable states are also involved in the trapping/detrapping processes, making the transition dynamics more complex and rich, and at the same time introducing an additional source of randomness [20]. While RTN is not observed in our relatively large-area memtransistors, the stochasticity of the trapping/detrapping processes manifests during the program/erase operations, leading to the cycle-to-cycle variation in ΔVTH. Note that a detailed discussion on the origin of stochasticity is beyond the scope of this article and interested readers can find more information in the literature.

Figure R1. Energy band diagram explaining the charge trapping phenomena while programming the 2D memtransistor.

The reviewer has raised an excellent point. The minimum program/erase pulse width is determined by the trapping/detrapping time constants, which can be as short as several hundreds of picoseconds. From the results, we can conclude that, for any given program/erase pulse magnitude, ΔVTH becomes smaller as the pulse width becomes shorter. To retain a similar ΔVTH for a shorter pulse width, a larger pulse magnitude is required, which will increase the energy expenditure. Therefore, one needs to strike a balance between fast programmability and energy consumption based on the application.
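The two-state trap picture used above to explain cycle-to-cycle variation (a trap that is either neutral or charged, with exponentially distributed transition times) can be illustrated with a minimal simulation. This is a sketch under stated assumptions, not the authors' model: the function names and the rate values are hypothetical, time is in arbitrary units, and metastable states are ignored.

```python
import random


def simulate_trap(rate_charge, rate_discharge, t_end, seed=0):
    """Two-state Markov trap: state 0 = neutral, 1 = charged.

    Dwell times are exponentially distributed, with rate `rate_charge`
    for leaving the neutral state and `rate_discharge` for leaving the
    charged state. Returns a list of (time, state) entries.
    """
    rng = random.Random(seed)
    t, state = 0.0, 0
    history = [(t, state)]
    while True:
        rate = rate_charge if state == 0 else rate_discharge
        t += rng.expovariate(rate)  # exponential waiting time in the current state
        if t >= t_end:
            break
        state = 1 - state
        history.append((t, state))
    return history


def charged_fraction(history, t_end):
    """Fraction of the observation window spent in the charged state."""
    charged_time = 0.0
    ends = history[1:] + [(t_end, None)]
    for (t0, s0), (t1, _) in zip(history, ends):
        if s0 == 1:
            charged_time += t1 - t0
    return charged_time / t_end


trace = simulate_trap(rate_charge=3.0, rate_discharge=1.0, t_end=20000.0, seed=2)
print(round(charged_fraction(trace, 20000.0), 3))  # close to 3 / (3 + 1)
```

Over a long window the occupancy settles near rate_charge / (rate_charge + rate_discharge), while any short window, such as a single program pulse, samples a different random outcome, which is the qualitative origin of the variation described in the response.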

The random number streams generated from one s-bit generator presented in the paper are around a few hundred bits long.
The reviewer has an excellent point here. The s-bit generator can be used to generate a larger set of random bits. We have utilized the s-bit generator to generate 10^4 random bits using the same programming and erase voltage pulses of 10 V and -7 V, respectively, each with a pulse width of 100 µs. The generated bits were evaluated using tests from the NIST statistical test suite, with the null hypothesis that the sequence is random at the 99% confidence level. It is evident from the results that the s-bits generated are truly random.

The distribution of the output voltage of the divider circuit, i.e., VN5, is shown in Fig. R3b. Note that the series connection of memtransistors MT1 and MT2 represents a voltage divider circuit, and hence VN5 is determined by their respective conductance values. Since the conductance of MT1 fluctuates from cycle to cycle owing to the programming and reset voltages applied to its local back-gate terminal, i.e., N2, so does VN5. In other words, the voltage divider translates conductance fluctuation into voltage fluctuation. Here, we have used 10^4 programming/erase/read cycles to obtain a larger data set for the cycle-to-cycle variation in VN5. Clearly, the distribution fits well with a Gaussian profile. While it is difficult to explain why the conductance distribution is expected to be Gaussian, the experimental results tend to fit well with such a distribution. Similar observations have been made in the literature on other material systems that involve charge trapping/detrapping. For example, the random telegraph noise (RTN) in scaled resistive random access memory (RRAM) devices originating from activation/deactivation of the electron traps in the filament also leads to conductance fluctuation that follows a Gaussian distribution [21]. RTN in small semiconductor devices also leads to conductance fluctuation that follows a Gaussian distribution [22]. However, there is evidence of skewed non-Gaussian distributions originating from RTN [23]. We will invest in detailed theoretical modeling of the phenomenon in the future to explain the origin of the Gaussian distribution.

The tunable probability shown in Fig. 3m, n is achieved through the non-volatile programmability of VIT. How would the VIT be programmed deterministically in a real application, without being affected by the same cycle-to-cycle variation?
The reviewer has raised an excellent point. We agree that the cycle-to-cycle variation in the programming of 2D memtransistors will lead to fluctuations in the threshold voltage of the programmed memtransistor, and hence in VIT of the thresholding inverter and in the probability pS for the s-bit stream. Fig. R4a-b, respectively, show the distributions of the threshold voltage and of VIT when memtransistor MT6 is subjected to 50 program/erase/read cycles with voltage pulses of -7 V and 10 V and a pulse width of 100 µs. The mean and standard deviation values were found to be -0.04 V and 0.08 V for the former, and 0.14 V and 0.08 V for the latter. This leads to variation in the programmed probability, shown as a band of uncertainty around its mean value in Fig. R4c. Therefore, we cannot claim that the programmed probability will be perfectly deterministic; instead, there will be a small uncertainty in its value.
We have revised Fig. 3n and included this uncertainty band and commented on the same in the revised manuscript.
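The link between a fluctuating analog voltage, a programmable inverter threshold VIT, and the probability of a '1' can be sketched numerically. The sketch assumes a Gaussian voltage (consistent with the fit reported for the divider output) and assumes a '1' is emitted when the voltage exceeds VIT; the actual polarity depends on the inverter stages. All function names and the numerical values below (mean, spread, threshold, threshold spread) are illustrative placeholders, not measured data.

```python
import math
import random


def p_one(v_it, mu, sigma):
    """P(V > V_IT) for a Gaussian voltage V with mean mu and std sigma."""
    return 0.5 * math.erfc((v_it - mu) / (sigma * math.sqrt(2.0)))


def s_bits(n, v_it, mu, sigma, seed=0):
    """Generate n stochastic bits by thresholding Gaussian samples at v_it."""
    rng = random.Random(seed)
    return [1 if rng.gauss(mu, sigma) > v_it else 0 for _ in range(n)]


def p_band(v_it_mean, v_it_std, mu, sigma):
    """Probability range when the threshold itself varies by +/- one std."""
    low = p_one(v_it_mean + v_it_std, mu, sigma)   # higher threshold, fewer '1's
    high = p_one(v_it_mean - v_it_std, mu, sigma)  # lower threshold, more '1's
    return low, high


# Illustrative numbers only (volts): voltage spread 0.5, threshold 0.14 +/- 0.08.
bits = s_bits(10_000, v_it=0.14, mu=0.0, sigma=0.5, seed=1)
print(sum(bits) / len(bits), p_one(0.14, 0.0, 0.5), p_band(0.14, 0.08, 0.0, 0.5))
```

The width of the band returned by `p_band` plays the role of the uncertainty band around the mean programmed probability: it shrinks when the threshold spread is small compared with the spread of the fluctuating voltage.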
Additional comments: 1. The information provided in the first paragraph of the introduction and especially the abstract appears a bit generic and could be focused more on the results of the study.
We agree with the reviewer's suggestions. We have revised the abstract and introduction accordingly.

The linear y-axis for Fig. 2c seems to be incorrect (probably missing a 1e-6 factor).
We are sorry for the typo. We have corrected the labels for Fig. 2c.

The reviewer is absolutely correct that the s-bit generators, in principle, can be realized using any three-terminal memory devices that show cycle-to-cycle programming variation. In fact, commercial silicon NAND Flash memory devices have been explored as high-quality true random number generators (TRNGs) [24,25]. However, instead of cycle-to-cycle programming variation, NAND-based TRNGs exploit thermal noise, random telegraph noise (RTN), and device-to-device variation in the threshold voltage. Furthermore, a digital interface involving silicon CMOS-based peripherals is also necessary for converting the analog fluctuations into s-bits with a reconfigurable probability of obtaining '1's or '0's in a given bit stream. Therefore, despite their maturity, the NAND flash and CMOS technologies based on silicon do not offer a monolithic solution today. This is the so-called von Neumann bottleneck, which highlights the importance of 'in-memory compute' for energy and hardware efficient acceleration of emerging computing paradigms including stochastic computing, brain-inspired neuromorphic computing, etc. Our demonstration of a compact s-bit generator based on only six 2D memtransistors reinforces this claim.
The choice of large-area grown monolayer MoS2 is motivated by the fact that atomically thin 2D materials are being seriously considered by the semiconductor industry (for example, companies like TSMC and Intel) for advanced technology nodes [26]. It is widely accepted that scaling silicon thickness beyond ~ 3-4 nm is challenging. Yet, the gate electrostatics demand aggressive reduction in the channel thickness to preserve the desired device performance for sub-10 nm technology nodes [27]. The ultimate channel thickness that one can envision for a field-effect transistor (FET) would be in the sub-1 nm range, which is not readily accessible for any three-dimensional (3D) semiconducting crystal due to increased scattering of charge carriers at the channel-to-dielectric interfaces that results in severe mobility degradation [28]. This opens up opportunities for semiconducting two-dimensional (2D) materials, which are naturally thin with monolayers having sub-1 nm (~ 0.6 nm) body thicknesses in case of transition metal dichalcogenides (TMDs) [29][30][31][32][33][34][35].
Based on the above discussion, our demonstration of a standalone hardware platform exploiting 2D memtransistors not only shows the promise of 'in-memory compute' for energy and hardware efficient acceleration of novel computing paradigms but also highlights the long-term benefits to the semiconductor ecosystem and a road to resolving the grand challenges that exist with the aging silicon technology. However, we also acknowledge that a significant amount of work will be necessary before 2D materials are introduced in commercial products, and there is scope for improvement and optimization of the 2D memtransistor technology. For example, large-area growth [26] and large-area transfer [35] of 2D materials must be perfected to minimize growth defects and damage caused by transfer, to ensure high yield during device fabrication. Finally, the programmability of scaled 2D memtransistors must be investigated to ensure that these devices can meet the requirements set forth by the International Roadmap for Devices and Systems (IRDS 2028).

2) In this work, cycle-to-cycle variation is used as randomness source, and I wonder if it would require too many cycles, resulting in a too early computation failure. It looks like a device should undergo program/erase per every clock, which means if the clock speed is 1GHz (lower than very conventional processing unit speed 2-3GHz), there should be 10^9 times of switching. This implies that the proposed device must have pseudo-infinite endurance characteristics like DRAM and unlike flash device. Honestly, it doesn't matter how fast the clock speed is, and it will be a critical issue even with kHz clock. However, the device in this work looks more like flash device and please discuss its feasibility considering endurance characteristics.
The reviewer's concern about the endurance of the device is completely valid, and we do agree that it is unlikely that the 2D memtransistor will have pseudo-infinite endurance characteristics like DRAM. We have now conducted endurance experiments on our MoS2 memtransistor with gate voltage pulses of -7 V and 10 V and a pulse width of 100 ns for up to 10^9 cycles. Fig. R5 shows the post-program and post-erase conductance measured at VBG = 0 V for up to 10^9 endurance cycles. There is no significant change in the two states. While we will continue to test the endurance of our memtransistor for a higher number of cycles in our future studies, we think that for the applications we target, i.e., edge computing, our technology will still be extremely useful. Edge applications significantly reduce endurance requirements to achieve energy and resource efficiency. For example, in weather forecasting, the BN will be used every minute rather than every microsecond; similarly, in medical diagnostics, the BN will be used only several thousand times a day to assess patients.

The reviewer has raised a valid concern. It is indeed possible that the distribution of the output voltage (VN5) of the divider circuit constructed using MT1 and MT2 may not necessarily follow a perfect Gaussian distribution even after the clock is applied for enough cycles. This will definitely lead to computation error. To assess the impact of a skewed distribution on the precision of the BN, we have performed simulations using MATLAB assuming that VN6, i.e., the output of the inverting amplifier of the s-bit generator circuit, follows the Pearson random distribution function. Fig. R6a shows the distribution of VN6 for different values of skewness from -1 to 1 in steps of 0.5. Fig. R6b shows the corresponding probability of obtaining '1' as a function of VIT. As the skewness increases, the deviation of this probability from its expected value also increases. Fig. R6c shows the colormap of the percentage error in estimating the output probability using the BN accelerator for different skewness in the two stochastic input variables that represent the two conditional probabilities. As expected, the percentage error increases with increasing skewness. We have added the above discussion in the Extended Data.

We are glad that the reviewer is satisfied with our revision. We would also like to thank the reviewer for recommending publication of our work in Nature Communications. We are happy to provide more details regarding the reviewer's remaining questions/suggestions.

The added explanation of the origin of cycle-to-cycle variation based on the charge trapping/de-trapping process is appreciated and is a welcome addition to the discussions provided in the paper.
We thank the reviewer for his appreciation of our revision.

Regarding the speed of operation of the s-bit generators, the study on pulse width dependence of transfer characteristics up to 100 ns presented in Fig. R2 is appreciated. In the same context, the authors mention that the minimum program/erase time that is dependent on the trapping/de-trapping time constants can be as short as several hundred picoseconds. It is requested that the authors provide references in support of this claim in the interest of the reader.
We agree with the reviewer's suggestion. We have added the following references in the revised manuscript.

The generation of 10^4 random numbers and the associated NIST STS test results (a subset of all the recommended tests) show that the device has the potential to produce high quality random numbers. Although not critical, one question that remains is why not generate more random numbers to complete more tests in the NIST suite, especially since the endurance has been shown to be much higher than 10^4. Clarifying this would position the potential of the device as an s-bit generator in a clearer sense.
We appreciate the reviewer's comment. Note that, in order to run the entire set of tests in the NIST suite, we need to generate > 10^9 random bits. While possible, this will require an unrealistically long time with our present measurement setup due to the limitation in the number of data points (~1000) that can be collected in a single experiment when the Keysight B1500 semiconductor parameter analyzer is used in the sampling mode. Repeating the experiment 1 million times is a herculean task, which limits our current capability to collect > 10^9 bits. We are working towards automating the data collection and are also communicating with the tool manufacturer to enhance the capability. We can assure the reviewer that in our future demonstrations, we will definitely include larger data sets.

The tunability of probability achieved through non-volatile programmability of VIT is not deterministic as has now been clarified by the authors. The associated plots in Fig. R4 on the variation in pS are appreciated. However, it is not clear if this variation in the programmed probability values will affect the proper operation of the Bayesian network built out of these devices. A discussion on this seems to be important for supporting the claim of these devices being hardware accelerators for Bayesian networks.
We agree with the reviewer that a discussion on the impact of variation in the programmed probability value on the operation of the Bayesian network (BN) must be included in the revised manuscript. We have now performed numerical simulation to show how the error in the programmed probability impacts the accuracy of the output of the BN.

However, I believe that the authors' response regarding the motivation of MoS2 should be included in Introduction. I think that Introduction is biased too much to Bayesian networks.
In addition, I think it would be better to provide how likely to happen for each skewness. For example, skewness = -1 looks like very unlikely to happen, but the readers cannot know how frequently this situation will happen. Perhaps, skewness vs (frequency or probability) plot will be helpful to understand this issue. If the authors can come up with some better plot or data for this concern, it will be also fine.
We are glad that the reviewer finds our revision satisfactory. We are happy to include more discussion on the motivation behind the use of MoS2-based memtransistors in the Introduction section. We also agree with the reviewer that skewness = -1 is very unlikely to happen. We have now mentioned that in the revised manuscript. We have refrained from including a skewness vs frequency or probability plot, as this plot would be mostly hypothetical. In our future studies, we will consider whether a better plot can convey the impact of skewness in the probability distribution on the performance of the Bayesian network.

Reviewer #4 (Remarks to the Author):
In this work, Zheng et al. introduce a hardware "acceleration" of a Bayesian network using 2D-material-based memtransistors. The highlight of the paper is an experimental demonstration with 29 memtransistors. However, the paper has important issues.
We would like to first thank the reviewer for acknowledging our efforts on hardware demonstration of a Bayesian network using 2D-material-based memtransistors. We are happy to resolve the issues the reviewer has mentioned.
1. First, the word "accelerator" should not be used. The scheme proposed by the authors involves programming memtransistors repeatedly (to do stochastic computing), which is a slow operation. Bayesian inference is slower with this accelerator than on a computer. I understand that programming memtransistors might become faster in the future, but it is an inherently slow operation due to memtransistor physics, which CMOS does not require.
We agree with the reviewer's suggestion. We have replaced the phrase "hardware acceleration" by "hardware implementation" in the title and text in the revised manuscript. We hope that this change is acceptable to the reviewer.

During the third phase (when VN1=VDD), there will be DC current flowing (this is not as in CMOS, where energy consumption stops as soon as the output has stabilized). If the drain current of the transistors is ~50uA, VDD~2V, and the third phase is 100ns (this is my best guess based on the paper), this additional energy consumption is going to be several times 10pJ! In fact, Eq 3 looks like the energy consumption associated with charging transistor MT1, but it is not the right equation (there is a problem with the 1/2 factor, this is a very classic mistake to just think about the stored energy, and not the whole charge/discharge process). The authors should measure the real energy consumption of the circuit. I also recommend talking to an electrical engineer with expertise in circuits, and to do, e.g., SPICE simulation.
The reviewer's observation is correct and we apologize for our oversight regarding the energy consumption. We have now revised the energy calculation for the s-bit generation circuit using Eq. R1.
In Eq. R1, the four voltages are the program, erase, read, and supply voltages, respectively; the gate capacitance is ≈ 10^-14 F; ε0 = 8.85 × 10^-12 F/m is the vacuum permittivity; 10 and 50 nm are, respectively, the relative permittivity and thickness of Al2O3; and 5 µm and 1 µm are, respectively, the channel width and length of the 2D memtransistor. The average current flowing through the s-bit generator circuit is the total current through the voltage divider, inverting amplifier, and thresholding inverter during each clock cycle. We have used 200 cycles to calculate the average current per 100 µs clock period based on the experimental measurements, as shown in Fig. R2. Since most of the memtransistors operate in their respective subthreshold regimes, the average current is ~ 1.5 nA. As such, the second term in Eq. R1a accounts for ~ 0.3 pJ, whereas the first term in Eq. R1a accounts for ~ 2 pJ. This results in ~ 2 pJ/clock-cycle, which supports our claim on energy efficient s-bit generation. We feel that SPICE simulation is beyond the scope of this work and will be included in our future studies.

small device-to-device variation in our MoS2 2D memtransistors, which supports proper operation of hardware Bayesian network with our 2D memtransistors.
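The numbers quoted in the energy response can be cross-checked with a few lines of arithmetic. The sketch below assumes a parallel-plate estimate for the gate capacitance and a static term of the form (average current) × (supply voltage) × (clock period). The supply voltage of 2 V is an assumption borrowed from the reviewer's own estimate, the function names are hypothetical, and the dynamic (first) term of Eq. R1 is not recomputed because its exact form is not reproduced in this text.

```python
EPS0 = 8.85e-12  # vacuum permittivity, F/m


def gate_capacitance(eps_r, width, length, thickness):
    """Parallel-plate estimate of the gate capacitance in farads."""
    return EPS0 * eps_r * width * length / thickness


def static_energy(i_avg, v_dd, t_clock):
    """Energy drawn by a constant average current over one clock period, in joules."""
    return i_avg * v_dd * t_clock


# Values stated in the response: Al2O3 with relative permittivity 10 and
# thickness 50 nm, channel width 5 um and length 1 um.
c_g = gate_capacitance(eps_r=10, width=5e-6, length=1e-6, thickness=50e-9)

# 1.5 nA average current over a 100 us clock period; V_DD = 2 V is assumed.
e_static = static_energy(i_avg=1.5e-9, v_dd=2.0, t_clock=100e-6)

print(f"C_G ~ {c_g:.2e} F")            # about 8.9e-15 F, i.e. roughly 1e-14 F
print(f"E_static ~ {e_static:.2e} J")  # about 3e-13 J, i.e. 0.3 pJ
```

Under these assumptions both results agree with the values given in the response (a gate capacitance of about 10^-14 F and a static contribution of about 0.3 pJ per clock cycle).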
The impact of device-to-device variation on the operation of the Bayesian network (BN) has been examined through numerical simulation to show how the error in the programmed probability due to device-to-device variation impacts the accuracy of the output of the BN. We have included these data and the impact of device-to-device variation in the Supplementary Information Fig. S13-S14.

Concerning spintronics
We agree with the reviewer's suggestion. The references regarding spintronic approaches based on superparamagnetic tunnel junctions with sense amplifiers are included in the revised manuscript.
6. Finally, the paper has many imprecisions, e.g.:
-"The success of biological brains in implementing BN lie in the inherently stochastic nature of neural computation". This is a highly debated hypothesis. A correct phrasing could be "could lie in the…"
We agree with the reviewer's suggestion. We have corrected the phrasing.
-"Certainly, the energy expense can be reduced by reducing the length of the s-bit streams at the cost of reduced precision". This should be backed by results.
The reviewer has raised a valid point. We would like to point out that the energy consumption is calculated per s-bit and was found to be ~ 2 pJ. The bit-length used for BN implementation is 200. While it is possible to encode a given probability using a bitstream of lower bit-length to reduce energy expense, this could lead to a significant loss of precision in the computed BN output for very small bit-lengths. In general, as we reduce the bit-length of the s-bit streams involved in BN implementation, we observe a deviation in the probability value encoded by that bitstream from the actual value. To support this claim, we have performed a numerical simulation to understand the impact of bit-length reduction on the accuracy of the BN output. As expected, the percentage error of the BN output increases with a reduction in the bit-length used to encode the probability values (a marginal probability of 0.59 and conditional probabilities of 0.39 and 0.75). The expected BN output in this case is 0.54. Fig. R5 shows the percentage error of the BN output as a function of bit-length. Clearly, the plot reveals that the percentage error increases with the reduction of the bit-length.
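The trade-off between bit-length and precision described in this response can be reproduced qualitatively with a small Monte Carlo sketch. It assumes a two-node network in which the parent bit selects, through a 2 × 1 MUX, between two conditional-probability streams. The node names A and B, the function names, and the use of a software pseudo-random generator in place of hardware s-bits are all assumptions made for illustration; only the probability values 0.59, 0.39, and 0.75 come from the text.

```python
import random


def exact_output(p_a, p_b_given_a, p_b_given_not_a):
    """P(B) from the law of total probability."""
    return p_a * p_b_given_a + (1.0 - p_a) * p_b_given_not_a


def bernoulli_stream(p, n_bits, rng):
    """Stochastic bit stream whose fraction of '1's encodes the probability p."""
    return [1 if rng.random() < p else 0 for _ in range(n_bits)]


def bn_output_estimate(p_a, p_b_given_a, p_b_given_not_a, n_bits, rng):
    """Estimate P(B) with a 2x1 MUX: the parent bit selects which stream passes."""
    a = bernoulli_stream(p_a, n_bits, rng)
    b_if_a = bernoulli_stream(p_b_given_a, n_bits, rng)
    b_if_not_a = bernoulli_stream(p_b_given_not_a, n_bits, rng)
    out = [x1 if sel else x0 for sel, x1, x0 in zip(a, b_if_a, b_if_not_a)]
    return sum(out) / n_bits


def mean_percent_error(n_bits, trials=200, seed=0):
    """Average absolute percentage error of the MUX estimate over many trials."""
    rng = random.Random(seed)
    exact = exact_output(0.59, 0.39, 0.75)
    errors = [
        abs(bn_output_estimate(0.59, 0.39, 0.75, n_bits, rng) - exact) / exact * 100.0
        for _ in range(trials)
    ]
    return sum(errors) / trials


print(round(exact_output(0.59, 0.39, 0.75), 4))  # 0.5376, i.e. about 0.54
for n in (10, 50, 200, 1000):
    print(n, round(mean_percent_error(n), 2))
```

In this sketch the error falls roughly as the inverse square root of the bit-length, so shortening the streams saves energy per inference at a predictable cost in precision.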

"The fundamental computing primitive for BN is a s-bit generator" (and similar phrases).
This should be "The fundamental computing primitive for the stochastic computing implementation of BN" (BN classically do not require s-bits).
We agree with the reviewer's suggestion. We have corrected the phrasing.

-Fig 3f does not have a legend on the x-axis
Thank you for pointing out the missing label; we have fixed it in the revised manuscript.